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! Abstract 

o 

This paper introduces the novel class of modulated cyclostationary processes, a class of non-stationary 
processes exhibiting frequency coupling, and proposes a method of their estimation from repeated trials. 



Cyclostationary processes also exhibit frequency correlation but have Loeve spectra whose support lies 
only on parallel lines in the dual-frequency plane. Such extremely sparse structure does not adequately 
represent many biological processes. Thus, we propose a model that, in the time domain, modulates the 
covariance of cyclostationary processes and consequently broadens their frequency support in the dual- 
frequency plane. The spectra and the cross-coherence of the proposed modulated cyclostationary process 
are first estimated using multitaper methods. A shrinkage procedure is then applied to each trial-specific 
■ estimate to reduce the estimation risk. 

in 

' Multiple trials of each series are observed. When combining information across trials, we carefully 

take into account the bias that may be introduced by phase misalignment and the fact that the Loeve 

\o : 

spectra and cross-coherence across replicates may only be "similar" - but not necessarily identical 



across replicates. The application of the inference methods developed for the modulated cyclostationary 
model to EEG data also demonstrates that the proposed model captures statistically significant cross- 



' frequency interactions, that ought to be further examined by neuroscientists. 
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I. Introduction 

The primary contribution of this paper is a novel model for nonstationary time series that rigorously 
explains a particular dependence structure in the spectral domain that current models cannot handle. 
This model is inspired by an electroencephalography (EEG) data set that exhibits coupling between 
frequency bands. We develop a time-domain representation of the model and derive the corresponding 
Loeve spectrum that captures the desired features. Methods for estimating such features are proposed, 
and challenges inherent in the estimation discussed. 

Nonstationary processes are ubiquitous in practical applications, we mention for example vibratory 
signals [1], [2], free-drifting oceanic instruments [3], and various neuroscientific applications [4], [5]. 
Under stationarity, there are several procedures available for obtaining mean-squared consistent estimates 
of the spectrum. When the assumption of stationarity is rendered invalid, this implies that the second- 
order structure of the process is usually more complicated. For example, in the case of harmonizable 
processes, the spectrum is now defined on a dual-frequency plane (— i, \) x (—\, ^) - in contrast to just 
the interval {—\,\) used in the stationary case. Thus, alternative assumptions need to be made to enable 
estimation, because it is generally not possible to estimate the Loeve spectrum at the 0(T 2 ) pairs of the 
fundamental Fourier frequencies from time series of length T without imposing additional constraints. 

Many models for non-stationary processes have been developed. In classical signal processing, the 
evolutionary behavior of the spectrum is modeled assuming substantial smoothness in the form of the 
time-variation of the process, see e.g., [6], [7]. These models form the backbone of using the popular 
windowed Fourier analysis which implicitly assumes that the signal is approximately stationary within 
the time window of analysis and that the spectrum of the time series is also smooth within the same 
window. If the smoothness assumption is not suitable, then replicates of the series are necessary to 
enable estimation. Unfortunately, in order for replication to be successfully utilized in the estimation, we 
must have time sample replicates that exhibit both perfect time alignment and exactly the same form of 
nonstationarity across replicates. 

In this paper, we propose an analysis framework that is valid for replicated time series (which is 
fairly typical in designed experiments) where we repeatedly observe phenomena that are "similar" across 
replicates but not necessarily identical, and the model must include various realistic features of the 
replication. First, there may be phase-shifts between the replicates which can complicate the estimation 
of the Loeve spectrum (or the coherency). Second, the dependence between a fixed pair of frequencies 
(or frequency bands) may not perfectly replicate at each trial. Hence, our model must account for this 
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variation and our estimation methods must have the ability to recover this variation. In Section [TTJ we 
shall discuss second order modeling of a time series and then focus on frequency correlation descriptions 
of the signal in Section III-Ai 

In order to fully understand the underlying stochastic process that produces the observed time series, 
it is important to develop time domain models that yield the features or shapes that we observe in the 
frequency domain. In analyzing our electronencephalogram (EEG) data, we observe spectral domain 
structures that are in parts reminiscent of existing models - e.g. cyclostationary models [8], [9], [10] - 
e.g. that have a baseline frequency of importance and where harmonics with this period may correlate. 
However, the structures that we observe do not exactly fit the current models because they show a 
significant spread in the dual-frequency plane. Thus, motivated by the results from a preliminary analysis 
of the EEGs, we introduce a time domain class of processes that may reproduce such structure. In Section 
III-Q we develop the proposed modulated cyclostationary process and discuss its relationship to modulated 
oscillations, see e.g. Voelcker [11]. Properties of the replication are explicitly discussed in Section Hl-DI 

Guided by the features of spread in dual-frequency and the variation and similarity across replicated 
trials, we develop an appropriate algorithm for estimation in Section [TlTJ First, as suggested by the data, 
we assume some smoothness of the covariance of the process in frequency to form a replicate-specific 
estimate of the Loeve spectrum. We shall also examine the validity of this assumption in Section IIII-Ai 
One possibility would be to compute the average spectrum across the replicates after denoising (testing 
for non-zero contribution) and then perform other linear analysis steps across replicates. These steps are 
described and discussed in Section IIII-BI 

It is critical to understand the family of Loeve coherence functions (or matrices sampled at a set of 
frequencies and collected in a vector). To explain and recover the possible structures present in these 
matrices, we could perform a singular value decomposition directly on the estimated matrices derived 
from each replicate. However, it is not a good idea to apply simple linear methods to the raw Loeve 
spectra because of phase shifts between replicates that will act antagonistically, as is discussed explicitly 
in Section ITlI-Ci Instead we threshold each replicate-specific estimated Loeve spectrum on its magnitude. 
We take account of the Hermitian symmetry of the spectrum as redundant tests makes the testing procedure 
deteriorate and additionally remove the frequencies that are known to contain unimportant neuroscientific 
information. Very few of the singular vectors are important to the family of matrices and we can recognize 
a family of possible structures. This provides a satisfactory description of the variability in frequency 
across the replicates. Finally, to arrive at a good understanding of a "population" of "similar" processes 
across replicates (rather than a trial-specific process), we must carefully consider the sequential ordering 
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of the replicates. As the true underlying Loeve coherence might have evolved over the course of the 
experiment, we anticipate the estimated Loeve coherence to also follow a similar behavior. Moreover, it 
is reasonable to assume that the true Loeve coherence is slowly changing throughout the duration of the 
experiment and thus we anticipate the Loeve coherence to be similar between "neighboring" replicates. 
We discuss this structure and use clustering methods to determine its form. 

We conclude by analyzing neuroscientific data with the characteristic of the models developed in this 
paper. Our choice of EEG data example motivated our development of the model-class of modulated 
cyclostationary process. EEGs have been important tools in clinical research to investigate underlying 
brain processes. This particular visual motor EEG data set has been analyzed by others using classical 
spectral methods: e.g., Freyermuth et al. (2010) developed a wavelets mixed effects model to estimate 
both the condition-specific spectra and variation in the brain responses across trials; and Fiecas et al. 
(2011) explored the brain effective connectivity structure in a network of 12 channels using generalized 
shrinkage methods for estimating classical partial coherence. Here, we demonstrate that the previous 
analyses missed very interesting oscillatory features - specifically the coupling between the alpha and 
beta oscillations - that we believe ought to be investigated by neuroscientists. We conclude by discussing 
how to encompass additional observed features in the model and put forth possible physical interpretations 
of the observed phenomena. 

II. Second-Order Modelling 

We shall discuss the analysis of the covariance of a zero-mean time series X t . We shall therefore 
model the dual-time autocovariance function [12] of X t given by 

s(t, I) = Cov [XuXt-t] = E [X t X t _i\ . (1) 

If X t is second-order stationary then s(t, I) takes the simpler form ?(!/!). It is convenient [13] to represent 
X t in the frequency domain and one can write down the Cramer representation to be 

X t = j\e* v (i2itft)dZ{f), (2) 

where {dZ(f)} is a zero-mean orthogonal increments process with variance Vai{dZ(f)} = S(f,f)df 
if the process has a smooth spectrum. If we define the Fourier transform of to be S(f), then 

for stationary processes S(f,f) = S(f). The total "energy" or variance of Xt admits a representation 
of integrating S(f, f) over all /. Therefore, if a given frequency / has a relatively large value of 
S(f, /) associated with it, then we say that oscillations of frequency / dominate the signal. Furthermore 
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the autocovariance sequence of a stationary process is a Fourier transform pair with the spectrum. 
Therefore the spectrum fully represents a Gaussian time series because series are fully specified by 
their autocovariance sequence. While useful in a general setting and the focus of several studies, the 
stationarity assumption is not valid for many forms of data. We now outline how the representation of a 
stationary covariance can be generalized to a larger class of processes that can capture realistic features. 

A. Dual-frequency representation of Covariance 

As noted, EEG signals are characterized in terms of their frequency content. While the spectrum 
already tells us a great deal about the presence of oscillatory patterns and behavior, it does not capture 
other interesting and more complex oscillatory properties of these signals, such as the dependence of 
the coefficients across the Fourier frequencies. In general, for non-stationary signals, such correlations 
are present, even if impossible for stationary signals and are important to characterize. It is therefore 
necessary to define a measure of the degree of correlation between the increment process at two different 
frequencies. For this purpose, we use the Loeve coherence (a measure of the covariance) using the 
formalism of harmonizable processes [14]. A time series X t belongs to the harmonizable class if it has 
the Cramer-Loeve representation 



Xt=l i exp(2TTift)dZ(f), (3) 

2 

where the increment random process dZ(f) still has zero-mean but also satisfies 

Cov[dZ(/i),dZ(/ 2 )] = E[dZ(f 1 )dZ*(fa)} = S(fa,fa)df 1 dfa, 

where dZ*(f) denotes the complex-conjugate of dZ(f) and S(fa, fa) is a complex-valued scalar quantity 
called the Loeve spectrum [12]. It may appear that Equation Q strongly resembles Equation © in form, 
but the introduction of the correlations between frequencies significantly modifies the appearance of 
realizations {X t }. The Loeve spectrum easily relates to the primary quantity of the auto-covariance of 
Xt from the following relationship 

s(t, I) = cov[X t , X t _{\ = f 2 f 2 S(f u fa)e 2i ^-M t+ M dfl d f 2 . (4 ) 



Thus, the covariance of X t with X t -i across time t is still decomposed to contributions across frequency. 
However, in general, we also need to consider the cross-dependencies between different frequencies. 
The Loeve coherency at the pair of frequencies (fa, fa), denoted r(/i, fa), is defined to be 

r(fa,fa) = 7^==S= f f =P 1/2 (h,f2)e-^\ (5) 

V^Ul) fa) 
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where the Loeve coherence at (/i,/2) is p(/i,/2) = f2)\ 2 , and the Loeve coherency phase is 

<K/i> /2)- The Loeve coherence values lie in the range [0, 1] where values closer to 1 indicate a stronger 
linear dependence between the random coefficients dZ(f\) and dZ(f 2 ). It can also be interpreted as the 
proportion variance of dZ(fi) that is explained by a linear relationship to dZ(f 2 ), e.g. 

dZ(f 1 )=TdZ(f 2 ) + s(f 1 ,f 2 ). (6) 

where e(/i, f 2 ) is uncorrected with dZ(f 2 ). When the Loeve coherence is close to 1, one interpretation 
is that increased oscillatory activity at the f\ frequency band is strongly associated with an excitation (or 
an inhibition) of oscillatory activity at the f 2 band. When |t| < 1 then the interpretation of Equation © 
is that the variance of dZ(fi) is "partially explained" by dZ(f 2 ). Some care must be applied, because 
the complete relationship between two complex- valued quantities may be widely-linear rather than linear, 
see. e.g. [12, Eqn. 12], and the full set of relationships takes the (widely linear) form of 

dZ{h) = rdZ(fr) + (dZ*(f 2 ) + s(h,f 2 ), (7) 

where e(fi,f 2 ) is both uncorrected to and has zero relation with dZ(f 2 ). Because dZ*(f 2 ) = dZ(—f 2 ) 
we can take this extra information into account by considering the correlation between the full set of 
frequencies (e.g. also positive frequencies with negative frequencies) when analyzing dZ(f), rather than 
calculating the Loeve coherence of dZ(f) and dZ*(f) and the complementary Loeve coherence over a 
restricted set of frequencies^. 

Having extended the classical spectrum to the Loeve spectrum (with particular characteristic features) 
we will now need to develop inference methods for this scenario. There are two natural sets of assumptions 
one can make concerning the Loeve spectrum. First, one can assume S(fi, f 2 ) to be a smooth function 
of f\ and f 2 and adapt statistical procedures from non-parametric function estimation. Second, one 
can impose some form of sparsity on the support of S(fi,f 2 ) - e.g., assume that the support of 
S(f\,f 2 ) is only a set of lines in the dual-frequency domain [15], [16] which is the case if the process 
is cyclostationary [9], [10]. Of course, our model assumptions should be as realistic as possible and 
must closely match the data being analyzed. Our motivation for these methods are features encountered 
in neuroscience, but we envision their application to be more generic to processes exhibiting strong 
harmonics such as (bio-)acoustic signals. In Figure [TJ we see these features where the raw non-parametric 
estimates of the Loeve spectrum with limited smoothing have periodic components that display time- 
variation and some "broadening" which is a deviation from crisply-defined spectral lines obtained from 

'Please see [12] for a more complete discussion of the generalized spectral coherence. 



March 7, 2013 



DRAFT 



STATISTICAL SCIENCE RESEARCH REPORT 316 




-30 -20 -10 



10 20 30 




-30 -20 -10 10 20 30 




-30 -20 -10 



10 20 30 



Fig. 1. Each plot is the square root of the Loeve coherence averaged across separate blocks of 20 trials. The x and y axes are 
for frequencies in the range (—30,30) Hertz. Plot (a) corresponds to trials 1-20; Plot (b) corresponds to trials 41-60; Plot (c) 
corresponds to trials 81-100. These patterns are highly replicable across the aggregate trials. 



the class of cyclostationary processes. Some care should be taken when interpreting these figures, as 
when averaging over trials we are getting "sample characteristics" over the full mixture existing in the 
population of matrices over the population, a topic we shall revisit. 

B. Brief Review of Cyclostationary Processes 

We first discuss the model structures that will allow us to describe correlations in frequency like in 
Figure Q] Firstly, there is structure consistent with support on lines parallel to the diagonal. The simplest 
such process is a cyclostationary process (see e.g. [9]). The covariance of cyclostationary process repeats 
perfectly in time over a cyclical period of D > 0, say. In fact the Loeve spectrum of a cyclostationary 

process takes the form (see e.g. [8], [17]) for some integer value C > 1 

c 

S(fi,f2) = £ S c (f 2 )5 (/! -h - ^) , (8) 

c=-C 

where {S c (f)} are 2C + 1 functions defined for / £ [—1/2,1/2], with respective inverse Fourier 
transforms s c (l). The sum in © must be symmetric in c as the observed process X t is real-valued. 
This Loeve spectrum is equivalent to a time domain covariance of 

s(t,l) = J\J\ f S(f + vJ)e 2 ^ t+ Mdvdf = f\ S c (f)e 2 ^ t+ f l Uf = £ s c (l)e 2 ™^, (9) 

2 2 f 2 C= — C C= — C 

where q c (t, I) = s c {l)e 2l7T ~D t and each contribution to the covariance {s c (t, I)} has associated period D/c, 
and where each contribution represents a line in the Loeve spectrum, spaced at c/D from the diagonal. 
To understand how line components aggregate in the representation, consider the example 

X t = et + a(f')cos(2irf't + Mf'))£t + a(2f')cos(2Tr2ft + MV'))et (10) 
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which is a twice-modulated stationary process e t - Then, the Loeve spectrum of X t has contributions at 
fx = fi an d fx — fi = ±c/'. If the spectrum of et, denoted S e (f,f), exhibits a peak at /', then this 
peak generates a whole structure from the modulation of several peaks. See Figure 0a), which illustrates 
how one peak in the stationary process is propagated onto several points by the modulation, where the 
<S c (/i, /') are determined by the spectrum of et and the amplitudes a(f') and phases 4>{f) and so on. The 
linear mechanism of Equation (fTOt mimics the behaviour of non-linear systems of coupled oscillations, 
see for example the full discussion in [18, pp. 25]. However, while the cyclostationary model allows us to 
understand the mechanisms of the second order structure of non-linear models with coupled oscillations, 
and describes the "skeleton" of the features present in the EEG data, it is not sufficient to describe the 
covariance properties of the observed data in more detail, as evidenced from Figure Qa) and (c) showing 
individual trial estimated Loeve coherences. 

C. Modulated Cyclostationary Processes 

We now develop the novel class of modulated cyclostationary processes. In doing so, we wish to 
retain the assumption of simplicity of the Loeve spectrum but do not constrain the true geometry of the 

spectrum to live on the exact lines as in cyclostationary process. Hence, we relax Equation (© to 

c c 

s(t,l) = *&Q= E *c(t)s c {l)e 2Mc ' D . (11) 

c=-C c=-C 

This covariance is nearly identical to Equation © but now each line contribution in the covariance has 
been modulated by the time-varying amplitude a c (t). 

This is very similar to the notion of uniformly modulated processes; see e.g. [18] but acting on the 
components of the covariance sequence of the process, each component being shifted in frequency by 
c/D. Therefore, unlike uniformly modulated processes, here we allow each extra line to be modulated 
differently. The combination of these actions is a generalization of the uniformly modulated processes. 
The reason for this is two-fold. First, we are modulating each spectral correlation component {s c (l)} 
using a different function, and so our process cannot necessarily be written in the form X t = a(t)rj(t) 
where r]{t) is a cyclostationary process. Second, even if each s c (Z) is modulated in an identical manner, 
the corresponding r](t) is not stationary but cyclostationary. 

Some care also needs to be used in designing the modulating functions {a c (t)}. The diagonal structure 
in the Loeve spectrum is preserved as broadened lines only if a c (t) is varying more slowly compared to the 
oscillation e 2lntc / D _ Otherwise, the diagonal lines will appear as "shifted" in frequency. One option would 
be to constrain the support of the Fourier transform of a c (t) so that it is sufficiently well within —c/D 
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and c/D. This constraint is often encountered in signal processing when modeling an amplitude varying 
oscillation giving rise to a signal that is strongly related to the types of processes that we wish to describe, 
see e.g. the common constraints put on amplitudes in Bedrosian's theorem [19]. In addition to amplitude 
modulation, it would also be natural in some scenarios to include frequency modulation and replace a c (t) 
by a c (t)e llpc<yt ' > in Equation (fTTI ). Care would have to be taken to keep the average modulation moderate, 
e.g. J a 2 (t)ip' c (t)dt/ f a 2 {t')dt' ps 0, as otherwise the process changes characteristics. We investigate this 
further in the examples in Section JV] 

We find that the Loeve spectrum is then an aggregation of 2C + 1 components taking the form 

S c (f + u,f) = l)e 2MiV+lf) = E s c (l)a c (t)e- 2 ™ tc / D e 2 ^ +l V = S c (f)A c (v - -^)(12) 

t,i t,i 

Comparing Equation © to Equation (fl2l ). we see the difference between the modulated and the standard 
cyclostationary model: the multiplication with the amplitude function a c (t) in Equation (fT2l ) will yield a 
broadening of lines centered at v = c/D, with a potential asymmetry around the line dependent on the 
form of A c {-). See Figure |3jc) for an example of the Loeve spectrum estimated from real data and also a 
synthetic example in Figure |2d) with the associated modulating function a c {t) in Figure |2{c). Note that 
the variance across many near lying oscillations will broaden the populations of oscillations even further, 
a feature which is consistent with our trial averages. From the Heisenberg-Gabor principle (see e.g. [18, 
p. 151]), when a c (t) is more transient in time, then the Loeve spectrum is broader or more dispersed 
in frequency. Thus, if a signal displays a very brief temporal response, then the response will be very 
diffuse in dual-frequency. In addition, as noted, we could introduce frequency modulation to Eqn (PT2T i 
by replacing a c {t) by a c (t)e~ lVc<yt \ as long as |v' c (i)| is constrained not to shift the mean frequency. 

To connect our results more generally with amplitude and frequency modulated signals more strongly, 
let Ut be a zero mean stationary signal and take K/2 to be a positive- valued integer so that 

K/2 

X t = a k (t)e 2lnkt/D U t , (13) 

k=-K/2 

where Ut is a stationary process with autocovariance sequence s(l). Then 

K/2 K/2 

cov{X t ,X t ^} = £ E Mt)e l27Tkt/D a kl (t-l)e~^ k '^/ D ~s(l) 

k=-K/2k'=-K/2 
K K/2+c 

= E E «c+,(t)«„(t-Z)e^ 2 ' r ( c +^/ - 27rv ^/ )S(/). 

c=-Kv=-K/2+c 
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For transparency of expression we investigate a simplified case and let a&(i) = aku(t) for k = 
—K/2, . . . , K/2, and this implies that the form of the covariance takes the simpler form of 

K K/2+c 

cov{X t ,X t -t} = u(t)a(t-l)e 2i7TCt / D £ a c+v a v e^ vl l D s{l). 

c=-K v=-K/2+c 

To further simplify the covariance we define an additional function s c (l) and find that 

K K/2+c 

cav{X t ,X t ^} « Yl a 2 (*)e 2i7rct/D s c (/), s c (l) = £ a c+v a v e i2 ™ l / D s(l), 

c=~K v=-K/2+c 

as long as a(t) is not too variable. Thus we see a simple path to approximate generation of signals with 
the desired characteristics. 

D. Replicated Modulated Cyclostationary Processes 

We have proposed a new class of modulated cyclostationary processes. Now, if we repeatedly observe 
signals as realizations of a process, we need to understand the precise notion of replication in the data 
generating scheme. For each replicate r, we shall assume that the length of the time series is T and 
denote the random vector for the r-th replicate to be X( r ) = (^Xi + ( r _i^ T . . . X t t\ ■ It is reasonable 
to assume the probability density function of X( r ) to be from some mixture 

M 

p (X«) = Y ^ {rm) Prm (XW) , (14) 
m=l 

that is, the distribution is a mixture between M different states, and where naturally Ylim ^ rm ^ = 1 
for any replicate r. As a first step, we shall assume that the M densities {p rm (•)} m Equation ([141 
are Gaussian. However, we shall be a bit careful in modeling other aspects of p rm (•). In many related 
problems one would usually take p rm (•) = p m (•). However, this constraint is too stringent and would 
fail under non-stationarity because time-shifts and phase-shifts fundamentally change the distribution 
between replicates, and some r-dependence must be inserted back into the model. Here, both a replicate- 
specific overall time-shift 5^ and between component phase shifts will be needed. We shall also 
permit amplitude modulation per component given by 9i r \ We can think of 5^ as "synchronizing" the 
whole waveform observed in X( r ), while (fic "synchronizes" the harmonic components present in X^ r \ 
vis-a-vis each other. 

Then using the newly introduced parameters, we model the common covariance in the replicated r 

when mixture m only is present as 

c m c m 

s w (t,i) = ^c m) (tj)= ^ r) 4 m) (*-^ (r) )4 m) (0 e2i7rte/jD+#r> > (is) 

c=-C m c=-C m 
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Fig. 2. Plot (a) illustrates that by giving a S contribution to the spectrum at frequency /' = ±1/6 the cyclostationary 
multiplication leads to a plethora of contributions across the dual-frequency plane. Plot (b) is the average of cross Loeve 
coherences for the real data over 20 trials. Plot (c) is the within trial modulation function for a simulated example. Plot (d) is 
the average cross-coherence of a uniformly modulated cyclostationary process over 20 simulated data. 

where by necessity </>q^ = 0, 9c > and G (— 7r, 7t]. This model may seem somewhat clumsy but 
each symbol is directly modulating an aspect of the covariance. With this we find that the covariance in 
frequency is given by 

c m 

S {rm \f + U,f) = 4 m) (/)4 m) (^- ^)fl( r V<^ ) +2^-^)<5 M . (16) 

c=-C m 

We therefore determine that the covariance of the increment random process {dZ^if)} for the r-th 



March 7, 2013 



DRAFT 



STATISTICAL SCIENCE RESEARCH REPORT 316 



12 



replicate is strongly dependent on the replicate-specific phase-shift <jy c r \ the time shift S^ r \ and the 
amplitude modulation 9 c r . The function A c (-) will still cause "blurring" in frequency and instead of the 
6- function localization of Eqn ([8]) we shall observe spread in dual-frequency. For estimation purposes 
especially the phase-shift and time-shift in Eqn (fT6l) are problematic as they cause local variability in 
S(f + v, f) and so local non-parametric estimators that smooth may perform badly, and the additional 
dependence of the time and phase shift are unfortunate. Due to the previously specified locality of A c m \-) 
there will be no substantial overlap in the plane between the different components enumerated by c. We 
can therefore say that 

c m 



E N m) (/) A c 



m). 



-C„ 



D' 



6<?\ 



(17) 



P. 



Note that the variation in magnitude across replicates is now uniquely captured by the constant 9 C 
Absolute coherency for the r-th replicate using component m near contribution c can then be written as 

|S-<™)(/i,/ 2 )| 



P 1/2 (/i,/ 2 ) 



^|S(™)(/i,/i)|S(™>(/ 2) / 2 )| 



si m \f 2 ) AT\h-h 



(to). 



c_ • 



9« 



sirHh] 



1/2 



1/2 



A 



(to). 



C ' 



7 



P l/2 
rmc 



9 (r) 
(fit $2) ^ 1 



1/2 

where /3 mc / 2 ) is modeled in terms of a basis expansion 



pU 2 c(flJ2) 



-"mcp 



(18) 

for each p starting with 



The basis {ip p (fx, / 2 )} is chosen iteratively to maximize the magnitude of c 
p = 1, and so Equation ( fT8T ) in fact defines {t/; p (/i,/ 2 )}. Constructing this representation is generally 
fine as long as the function p 1 ^ 2 (/i,/2) is square integrable. While Pmc(/i,/ 2 ) > 0, this is no longer 
true for the elements in which we do the expansion, e.g. point-wise ip p (fi, / 2 ) may be negative, but this 
should produce no problems. We get that 



Crr 



,V2 



(/l,/2 



a(r) 
7 



C m 00 „(r) 00 

E E7o Cmc ^ (/i ' /2) = ^^ p(/i ' /2)(19) 

=-C m p=l ^0 



c=-C m p c=-C m p=l u p=l 

We therefore obtain that the coherency p 1 ^ 2 (/i, / 2 ) for replicate r can be represented in terms of a basis 
expansion where the weights depend on the trial and the state m. As usual {ip p / 2 )} p are chosen to 
be orthonormal. 
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III. Estimation 

Having developed a non-parametric model that captures the structures we expect to see in the data, we 
now develop a procedure for estimating these structures. This is in contrast to statistics, where attention 
has been focused on very special forms of nonstationary covariance structures, see e.g. [16] and [9]. 
Having proposed a general non-parametric class of models, we now apply a non-parametric estimation 
technique. Our philosophy is that by applying non-parametric estimation methods we avoid the bias that 
is inherently present in parametric models may due to potential omission of features of the data. Ideally 
once such features have been discovered to be significant they would be put to further scrutiny. 

A. Multitaper (MT) Estimation 

Since the covariance structure has been modified by the set of smooth amplitude functions A ™ (•), 
it is reasonable to assume that the Loeve spectrum is smooth. We therefore need a more general non- 
parametric estimator, and we turn to [20] and the usage of multitaper estimators to this purpose. We 
develop a multi-taper procedure for estimating the Loeve coherence from several time-aligned brain 

(k) 

waves recorded from many trials. Denote X[ to be the time series recorded at the r-th trial. Define h t 
to be the k-th orthogonal taper that satisfies ^2 t [h[ h ^] 2 = 1 and ^ h[ k = if k\ ^ &2 (for a 
discussion of tapers and tapering see e.g. [21]). The k-th tapered Fourier coefficient at frequency / is 
then defined to be 

4\f) = Y^hi k) X^xp(-i2nft), /e(~i). 
The k-th Loeve periodogram at the frequency pair (/i, /2) is defined to be 

4 r) (/i,/ 2 ) = 4 r) (/i)4 r) *(/ 2 ), (/i,/ 2 )e(-^)x(-ii) (20) 

The tapered Loeve periodogram estimates can be averaged across tapers to produce a suitable trial-specific 
non-parametric estimator, the Loeve multitaper spectral estimator, 

T (r) (/i,/ 2 ) = ^E4 r) (/i ) / 2 ). (21) 

k 

Note that the expectation of l[ (/i,/2) is not exactly <S*(/i,/2) because of the blurring inherent in 
using tapers. It therefore may seem reasonable to weight the sum in (|2TI ) by the eigenvalues of the 
localization operator that produced the tapers {h[ k ^}, see e.g. [21], but as these are chosen to be so close 
to unity in most cases, using a plain average of the estimates with a weighting of unity makes little or 
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no difference in practice. To obtain an estimator of the "population" Loeve spectrum, we may take the 
average of the trial-specific estimators and can also estimate the coherency in Equation © 

T () (/i,/ 2 ) = ^E /(r) (/i'/2), fM(/ 1 ,/ 2 ) = - ^ r)(/l ' /2) (22) 
R r V/ (r) (/l,/l)/ (r) (/2,/ 2 ) 

Using multi taper methods to estimate the Loeve coherency has already been used in Geophysics [22]. 



Remark. We reiterate that the problem with averaging the Loeve periodogram in frequency and trial 
lies in the variability of the phase (see Equation ([Toll). For a cyclostationary process, we get a Loeve 
spectrum that is quite variable in phase across replicates and its phase exhibits smooth variation along 
parallel lines. The magnitude in contrast does not depend on replicate-specific time initialization of the 
cycles. We believe that over the tapers the estimated phase is stable for a given replicate at a given 
frequency pair but that there is variability in where in the cycle we "start" across replicates. Averaging 
phase dependent quantities in a direction perpendicular to the diagonals across trials therefore makes no 
sense, even if when we average the real and imaginary parts we are producing a magnitude weighted 
average of the phase, which is clearly an improvement. For this reason Equation (1221 is a "less than 
optimal" summary of the population characteristics of the Loeve coherence matrices. 

Note that l[ (fx, f%) € C and its imaginary part is non-zero for most pairs (fx, / 2 ). For a sufficiently 

— (r) —(r) 

large number of tapers K, it is reasonable to assume that 5ft{J. (fx, /2J} and (fx, /2J} are nearly 

jointly Gaussian (see e.g. [23], and [24] for stationary processes). This means that the complex variable 

— (r) 

I. (/i)/2) is nearly complex-Gaussian (see, e.g., [25, p. 39]). Moreover, when K is sufficiently large 

(r) 

and X\ is stationary then we argue in the Appendix that for fx i= fi, 

^(fi,f2)&N c (o,±y K\T^){fx,h)\ 2 &\xl (23) 

B. Multitaper-Singular Value Decomposition (MT-SVD) 

We have R individual replicate-specific estimates of the square-root coherency p r (/i,,f2)- Because 
we need to implement digital processing, we sample the frequency space to {fi, . . . , f^v} = & ", where 
fn = (fx U \ f%) i s tne n " tn P a ^ r °f fundamental frequencies. These frequencies do not necessarily cover 
the spectrum in the Nyquist range and hence we shall discuss what frequencies to include subsequently. 
We construct a vector pW = (^J 2 ^^ . . . ^ 2 (f N )J , and from the full set of vectors we form 

P T =(p« r ,p^ T ), P = U3V" = J>u fe vf . (24) 
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We can therefore chose to take Vk n = V'fc (fn) , and thus expand the matrix in terms of this basis. We 
would recognize similar shapes in the dual-frequency domain by examining {u^}. Now we do not wish 
to implement this procedure straight on the raw matrix P as the sampling characteristics of large matrices 
will make us "learn noise". We shall start by determining which part of the matrix is really non-zero. 

C. Thresholded Multitaper-Singular Value Decomposition (TMT-SVD) 

We first test whether the Loeve spectrum is zero at off-diagonal points (/i, ^2) where /1 7^ /2, as this 
will help us shrink the estimates of the coherence for each replicate, and thus remove spurious non-zero 
coherence. It would be tempting to compare raw coherency across replicates, but we remind the reader 
of the incoherent phase between replicates, as described by Equation (fT6l) . We threshold the individual 
coherency entries by using the distribution of Equation (1231) . implementing a conservative False Discovery 
Rates correction (FDR) [26] to avoid multiple testing issues, but accounting for Hermitian symmetry of 
the data. This produces p( ht ), from which a singular value decomposition can easily be calculated, 
resulting in uj^ (the trial specific vector), f^, ht ^ (the singular values) and vj?*^ (the frequency structure 
vector). We use £^ to determine how many wave-forms we need to keep in, in order to describe most 
of the covariance. We use uj^ as a basis from clustering the trials, using the /c-means method [27]. 

IV. EEG Example 

Overview of the EEG Analysis. Brain patterns operate at given ranges of frequencies (see [13]). Most 
studies concern frequencies corresponding to the (i.) delta band (0-4 Hertz) which appears in adult 
slow wave sleep; (ii.) theta band (4-8 Hertz) which is believed to be associated with the inhibition of 
elicited responses; (iii.) alpha band (8-12 Hertz) which is present when eyes are closed and is also 
associated with inhibition control; (iv.) beta band (12-30 Hertz) which is present during alert and active 
states; (v.) gamma band (30-100 Hertz) which are thought to represent the highly synchronous activity of 
neurons in response to a specific cognitive or motor function. To examine oscillatory properties of single- 
channel brain signals, scientists use spectral analysis methods [28] which decompose the covariance of 
an observed time series according to the independent contributions (weightings or variance) of complex 
exponentials associated with different frequencies. Empirical analyses of our EEG dataset suggest the 
existence of coupling or dependence between coefficients at the alpha (8 — 12 Hertz) and beta (12 — 30 
Hertz) bands. Such dependence should not be ignored as they could turn out to be important biomarkers 
for differentiating between different cognitive processes and mental states. While this concept of cross- 
frequency coherence is potentially powerful, it is not commonly used in practice because of the lack 
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of methods for testing significance of these dependence measures. In this paper, we elucidate on this 
concept of spectral dependence and use this to further interrogate the visual-motor EEG dataset. 

Description of the Data. The EEG data in this paper is recorded from a healthy male college student 
from whom we recorded potentials from the scalp using a 64-channel EEG system (EMS, Biomed, 
Komeuburg, Germany). The electrodes were applied to the scalp using conventional methods arrayed in 
the standard International 10-20 system with two electrodes that served as a ground and a reference. The 
EEG was recorded at 512 Hertz and then band-pass filtered at (0.5,100) Hertz. An additional notch filter 
at 60 Hertz was applied to remove the artifact caused by the electrical power lines. 
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The participant performed a visually-guided hand movement task where he viewed a video monitor, 
placed about 1 meter away, and responded to targets that jumped to the left or right from a central position. 
A target jump, occurring every 1.55 second, instructed the participant to displace the lever of a hand-held 
joystick from a central upright position to realign the visual representation of the joystick orientation 
with the displaced target, either to the right or left of center. He received instructions to start and to move 
quickly and accurately, and to return the joystick to the center position only when the target jumped 
back to the center of the video monitor. We analyzed EEG signals for 138 rightward movements from 
the center position. From the montage of 64 scalp electrodes, we selected the FC3 electrode, presumably 
placed over the prefrontal cortex, which is demonstrated to be implicated in premotor processing [29]. The 
prefrontal cortex, in coordination with the parietal cortex (which is responsible for visual sensation) and 
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the occipital cortex (which plays a key role in visual-motor transformations) are all engaged processing 
and execution of this visually-guided hand motor task. 

Results and Discussion. We produced an estimate of (/i, $2) for each trial r. For illustrative purposes, 
in Figure [3] (a) and (c), we computed T" r > (fx, fe) for trials r = 1 and r = 8. Of course not all non- 
zero Loeve dual-frequency coherence is really statistically significant. For each trial, we thresholded the 
estimated Loeve dual frequency spectra using the marginal distribution of Equation (1231 combined with 
the False Discovery Rates (FDR) procedure with a set rate of 5% [26]. To avoid Hermitian redundancy 
and dependence, the Loeve dual-frequency coherence matrix was subsampled to half its size over the 
frequency interval of interest [—30, 30] Hz before this procedure was applied. Two examples of estimated 
Loeve coherence, derived from the thresholded Loeve spectra, are shown in Figure [3] (b) and (d). We 
see two structures that clearly emerge from the thresholded spectra: one is the presence of "thin" lines 
and the other is the presence of fat "blobs". The thin lines can be explained by the structure of a normal 
cyclostationary model, the fatter structure requires more modulation, like in Example 2. The blobs are not 
an effect of the multi-taper method even if this method essentially smooths the power across frequency: 
if the smearing was only due to a resolution issue then the two types of Loeve spectra would not appear 
in the data. A plausible explanation of the spread is the distribution of oscillations (or spread of power 
across some frequency band) and the temporal nonstationarities of the oscillations' amplitudes within a 
single trial is discussed in Section III-CI 

Before combining information across trials, we first computed the average dual-frequency coherence 
across all trials and plotted the average of the magnitudes in Figure [T] averaged over batches of 20 trials. 
One must be cautious when averaging the complex quantities as phase shifts may average real features 
to zero. There appears to be some homogeneity of the response across trials, and so the distribution of 
magnitude of the coherence averaged over multiple trials is stable and similar (e.g. Figure [TJ subplots 
(a), (b) and (c)). However, these are population results. To see trial-specific characteristics, we refer to 
individual estimates of the Loeve coherence in Figure [3] 

A close examination of the individual estimates suggest that there are two subpopulations to this 
group of Loeve (magnitude) coherence matrices. To further investigate this between-trials structure, we 
performed a singular value decomposition (or principal component analysis) on the magnitudes of the 
thresholded matrices in Figure [3] We plotted the first four principal components in Figure |4] The first 
principal component (PC) is very similar to the averages shown in Figure Q] (a) and thus the first PC 
represents a kind of "average". The second PC in Figure @Ib) can either add to or subtract from the 
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(a) (b) 




Fig. 5. Plot (a): The sparsity of the matrix versus the sparsity across the loadings (recall the weights for each PC is normalized 
to one). Plot (b): The loads for the first three principal components versus trials. Plots (c) and (d): The mean of the two clusters 
of the weights. 

"squares" of Figure Q2a). The third and fourth PC in Figure |4]c) and (d) add additional structure. We 
plot the singular values in Figure 0a). As already noted, the inclusion of the second principal component 
leads to substantially sparser correlations and the sparsity of the matrices is clearly correlated with the 
second weight (Figure |6tb)). We then used the k-means method to cluster the trials using only the first 
two sets of weights. This decision was made empirically by looking at the weights vs series (e.g. Figure 
|7Ja) and (b)). The analysis yielded clearly made up groups, see Figure [6jc) where the sign of the second 
loadings decided the group to which the matrix belongs on trial b. There was a clear trial specific effect 
and we show the labels of groups versus trials in Figure [6jd), where for example cluster two becomes 
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Fig. 6. Plot (a): Singular values. Plot (b): The fraction of non-zero values of a matrix versus its weight to the second principal 
component. Plot (c): The first and second weights with colors superimposed to signify the clustering from the k-mean method. 
Plot (d): The labelling of each state across trials (d). 



less frequent across trials. 

The raw loadings utilized for clustering are displayed in Figure [5jb). Clearly the longer the subject 
waited, the higher the likelihood of diffuse correlation between frequencies, as shown in Figure |5]c) and 
(d). The effect of sparsity on the labels was also clear from Figure |5l a). Red color in the plots corresponds 
to the matrix in Figure I3c), which was the sparser matrix in each trial, but this type of magnitude matrix 
was less sparse across components for some degrees of sparsity (see Figure |5]a)). It was less sparse 
across components because it was not as "empty" as seen from Figure He), but corresponded to the 
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structure with some extremely narrow lines as shown in Figure [3jb). This will require several principal 
components, each of which will be specific to each frequency that the brain can produce within a band. 
The more diffuse structures in Figure |3jd) is the blue color in the plots and is the fatter matrix in Figure 
[3d). These oscillations were more variable within a trial, as discussed due to variable amplitudes within 
the trial, causing the narrow lines to spread. 

In summary, our proposed modulated cyclostationary model captured highly significant and interesting 
interactions between the alpha (8 — 12 Hertz) and beta (12 — 30 Hertz) bands, as clear from Figure 
[3d). These results would not have been obtained from the usual spectral analysis procedures - which 
ignore these types of interactions. Neither does the sparser structure of Figure |5jc) agree with a stationary 
model as negative and positive frequencies are correlated. This correlation corresponds to a synchrony 
of oscillations corresponding to starting at a given time after stimulus. 

Our findings are interesting because they support the general consensus that the alpha and beta 
oscillations play significant individual roles during movement. [30] demonstrate that beta activity is 
closely linked to motor behavior and is generally attenuated during active movements. Moreover, studies 
have shown beta band activity to be significant even during just motor imagery and also for task switching. 
Alpha activity was indicated in [31] to reflect neural activity related to stages of motor response during 
a continuous monitoring task. In fact, in a similar response in beta power, alpha power is reduced at 
several central electrodes during response execution. Further studies demonstrated widespread high alpha 
(10 — 12 Hertz) coherence increase around the primary sensorimotor cortex during response execution, 
inhibition and preparation. While these studies show the contribution of the individual alpha and beta 
activity during movement, our findings suggest the association and some temporal alignment between 
these two oscillations which we submit to the neuroscience community for further interrogation. 

V. Discussion 

Stationarity is a traditional assumption that enables consistent estimation of the covariance of a time 
series without replicated measurements. For this reason much of classical estimation theory relies on this 
assumption. As demonstrated in this paper, there are scenarios when this assumption is unreasonable. 
However, it is unclear what should be done if the assumption is relaxed, especially if there is no sense 
of replication between sets of observation. 

In the absence of replication, a simpler model must be posited so that 0(T 2 ) covariances need not be 
estimated. If we insist on retaining a non-parametric model of covariances, then these must be changing 
slowly, to enable estimation. In contrast, with replication, a higher resolution estimate is feasible. To 
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enable estimation with replication, a model for the form of replication must be constructed. As we 
have investigated in this paper, allowing for any randomness in phase can create destructive interference 
once combining information across replicates. We therefore propose to remove the phase of the Loeve 
coherence to avoid this destructive interference. 

Finally a sensible approach to extracting common features from the replicates must be designed: we 
here chose to formulate a model, which prompted us to use the singular value decomposition. If the entire 
time series data contains more than one population, straight averaging is not going to work even if the 
phase is removed. With straight averaging, the averages between the two populations will be recovered 
instead of a good estimate of each individual population. We show for simulated and real-life data how 
the singular value decomposition can recover the true underlying populations, and help us classify the 
state according to their frequency correlation. Smooth and visually appealing frequency correlation maps 
are produced from such (see Figures |5]c) and (d)). 

There is an underlying philosophical issue from our analysis. Much effort has focused on defining 
time series models that are either stationary, or locally stationary in a traditional sense. However real- 
life data challenge the perception that such is pervasive and the silver bullet for time series applications. 
Traditional nonstationary models, such as the locally stationary model of Priestley [32] and [6], the SLEX 
(local Fourier) model in [33], and the locally stationary wavelet model in [34] are an extremely useful 
addition to the literature, but cannot capture all features of real data, such as correlation of strongly 
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separated frequencies. Our addition of the modulated cyclo stationary process is to introduce a new model 
class capable of capturing frequency correlation, but more flexible and parametric model classes are 
sorely needed. Until such are developed, non-parametric approaches are pivotal to learning additional 
characteristics of the data. Until such are developed so that we may estimate important features in single 
trials, we will need to use repeated trials to highlight and extract important time series characteristics. 
It is important to always investigate such possibilities when analyzing data, or important aspects of the 
data will be missed. This will cause us to not infer the correct generating mechanism of the data, and is 
so very serious. 
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Appendix 

Statistical Properties of the Fourier Transform 

An arbitrary random vector Z which is complex-Gaussian is denoted as Z = Nc(fJt, S, C), which 
means the real and imaginary part are jointly Gaussian and 



E{Z} = nz, var{Z} = E{(Z-/i z )(Z-/x z )"} = 5] z 



(25) 



Rel{Z} 




(26) 
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In the special case of Cz = the distribution is complex-Gaussian proper and is written Z = Nc^p{nz,^z)- 

— (r) 

For /. (/i, / 2 ), we denote 

Sz = sM(/i,/ 2 ) and C z = C (r) (/i,/ 2 ) 

By Isserlis' theorem [35], 

E{4 r) (/i)4 r) *(/2)} = ^ (r) (/i,/ 2 ), E{4 r) (/ 1 )4 r) (/ 2 )} = cM(/ 1 ,/ 2 ) 

var{x( r) (/!)4 r) *(/ 2 )} = ^ ) (/i,/i)^ ) (/ 2 ,/ 2 ) + |cW(/ 1 ,/ 2 )| 2 
Rel{4 r) (/ 1 )4 r) *(/ 2 )} = C7W(/ 1 ,/ 1 )CW*(/ 2 ,/ 2 )+(sW(/ 1 ,/ 2 )) 2 . 

— (r) 

For our convenience we define the variance and relation of I. / 2 ) to be, respectively, 

a2(/i/2) = 5W(/ 1 ,/ 1 )5W(/ a ^) + |i Z W(/ 1 ,/ a )| a 6R+> 

^)(/ 1 ,/ 1 )i?(^(/ 2 ,/ 2 ) + (^)(/ 1 ,/ 2 )) 2 
c/(/i,/ 2 ) = — eC. 

Thus, for an arbitrary harmonizable process, 

T (r) (/i,/ 2 ) = 2< r) (/i,/ 2 ) e -^ r> ^>^ =Nc{s^(f 1 ,f2),a 3 i(fuf2)Mfuf2)) 
where l {r \f 1 ,f 2 ) € M + . If the process X t (r) is stationary then S^{f u f 2 ) = ^ (r) (/i)^(/i - and 



thus, 



7!"(A. A) ^ *c ( ^» - A), S "' (/l f' (g ' r>(/ '^ (/ '- A) ) . (27, 
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